fix(assistants): reconcile MCP server set on every turn#3013
Conversation
The runner only saw the MCP server list its initial bootstrap returned; toolset edits made after that point (e.g. attaching GitHub MCP) never reached a live runtime and the model could not list, call, or mcp_force_reconnect those servers. Server now resolves the current MCP set from the assistant's toolsets on each /turn and includes it in the dispatch body. The runner forwards it to a new McpCmd::Reconcile against a per-thread known set: new servers get registered + connected (with assistant_mcp_auth_required notices injected via the thread inbox on AuthRequired), removed servers get disconnected. The mcp_force_reconnect tool's server_id schema enum is dropped — it was frozen at bootstrap and lied as soon as reconciliation landed.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🦋 Changeset detectedLatest commit: a83ee4f The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d231e6e8a9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let Err(err) = manager.disconnect_server(&server_uid).await { | ||
| tracing::debug!(server_id = %id, error = %err, "reconcile disconnect (ignored)"); | ||
| } | ||
| ctx.known.remove(&id); |
There was a problem hiding this comment.
Remove detached MCP servers from reconnectable state
When a server is removed during reconcile, this code only calls disconnect_server and then deletes the ID from ctx.known, but it never removes the server config from McpServerManager. Because ForceReconnect later calls connect_server by ID against manager configs, a previously detached server can be reconnected and exposed again if the model/user reuses that stale ID. Since that ID is no longer in ctx.known, future reconciles also won’t clean it up, so detached integrations can persist unexpectedly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in a83ee4f: known now records only servers that connected (or that surfaced an auth notice). Transient transport failures stay out of known and the next /turn reconcile retries the connect.
| ctx.known.insert(server.id.clone()); | ||
| let server_uid = McpServerId::new(server.id.clone()); | ||
| if let Err(err) = connect_and_log(manager, &server_uid, "reconcile_add").await |
There was a problem hiding this comment.
Retry failed MCP additions on later reconciles
New servers are marked as known before connection succeeds, so a transient non-auth failure in connect_and_log permanently suppresses future auto-connect attempts: subsequent reconciles skip the server due to ctx.known.contains(...). This means a newly attached integration can remain unavailable for the lifetime of the thread unless manually force-reconnected, which defeats the per-turn reconciliation behavior under flaky network/startup conditions.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in a83ee4f: ForceReconnect now rejects any server_id that isn't in ctx.known. agentkit-mcp has no unregister API so the config does linger, but the guard means a stale id can't be resurrected via mcp_force_reconnect.
🚀 Preview Environment (PR #3013)Preview URL: https://pr-3013.dev.getgram.ai
Gram Preview Bot |
Address Codex review feedback on PR #3013: - Insert into `ctx.known` only after a server connects cleanly or returns AuthRequired. A transient transport failure now leaves the entry out of `known` so the next /turn reconcile retries instead of suppressing the integration for the rest of the thread. - Gate `McpCmd::ForceReconnect` on `known`. agentkit-mcp has no unregister API, so a detached server's config lingers in the manager; without this guard a stale id could be resurrected by the model invoking `mcp_force_reconnect` against a server the user has already removed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a83ee4f894
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ctx.known.insert(server.id.clone()); | ||
| if let Some(notice) = create_auth_notice( |
There was a problem hiding this comment.
Retry auth-required servers when auth notice creation fails
In reconcile_servers, the auth-required path marks the server as known before the auth notice is successfully created and delivered. If create_mcp_auth_flow fails transiently (or the inbox send fails), subsequent reconciles will skip this server because of the ctx.known guard, so no later turn can re-emit assistant_mcp_auth_required and the integration can remain unusable for the thread.
Useful? React with 👍 / 👎.
| if let Err(err) = manager.disconnect_server(&server_uid).await { | ||
| tracing::debug!(server_id = %id, error = %err, "reconcile disconnect (ignored)"); | ||
| } | ||
| ctx.known.remove(&id); |
There was a problem hiding this comment.
Preserve detach retry when MCP disconnect fails
The removal loop drops the server ID from ctx.known even when disconnect_server returns an error. In failure cases this prevents future reconciles from retrying the detach (the ID is no longer tracked in known), so a server that failed to disconnect can linger in stale state for the lifetime of the thread.
Useful? React with 👍 / 👎.
Summary
/turnand sends it in the runtime dispatch body.McpCmd::Reconcile: registers + connects newly attached servers (surfacingassistant_mcp_auth_requirednotices through the thread inbox onAuthRequired), disconnects servers no longer in scope.mcp_force_reconnect'sserver_idenum is dropped — it was frozen at bootstrap and went stale the moment reconciliation landed.Previously a newly attached integration (e.g. GitHub MCP) stayed invisible to the running assistant until the runtime was restarted, leaving the model unable to use it or to invoke
mcp_force_reconnectfor it.Closes AGE-2516
✻ Clauded...